Optimal Grouping, Spacing, Stratification and
Piecewise  Constant  Approximation 

by 

R.  L.  Eubank 

Southern  Methodist  University 

1. Introduction and Summary. It has been recognized for some
time that there is a structural similarity between certain problems
of optimal grouping, spacing, and stratification. See, for example,
Cox (1957), Kulldorff (1958a, b, 1961), Sarndal (1961, 1962), Ekman (1969),
Bofinger (1975), Bühler and Deutler (1975) and Adatia and Chan (1981).

In this paper the underlying relationship between these and other
problems is established. Specifically, it is shown that all these problems,
when viewed in the quantile domain, become problems of optimal knot
(breakpoint) selection for piecewise constant $L_2[0,1]$ approximation.

This  fact  allows  us  to  develop  a  unified  approach  to  all  such  problems 
that  includes  i)  conditions  for  existence  and  uniqueness  of  solutions 
ii)  a  computational  algorithm  and  iii)  simple  approximate  solutions. 

In  addition,  this  approach  provides  insight  into  the  geometry  of  and 
connection  between  these  problem  areas.  Questions  pertaining  to  the 
equivalence  of  certain  problems,  such  as  considered  by  Adatia  and 
Chan  (1981),  become  questions  regarding  the  equivalence  of  certain 
function  approximation  problems. 

In the next section we examine a canonical form for the problems
to be considered and establish our principal results concerning its
solution. In subsequent sections these results are applied to various
problems of optimal stratification and grouping, optimal spacing and
grouping, and some bivariate stratification and grouping problems
that have appeared in the literature.


2. An optimal grouping problem. Let X be a random variable
with strictly increasing distribution function (d.f.) $F$ and associated
continuous probability density function (p.d.f.) $f = F'$. Define the
quantile function (q.f.) for $F$ as $Q(u) = F^{-1}(u)$, $0 < u < 1$, and the
density-quantile function by $fQ(u) = f(Q(u))$, $0 \le u \le 1$. Also let
$a = x_0 < x_1 < \dots < x_{k+1} = b$ (where we allow for either or both of
$a = -\infty$, $b = \infty$) represent a partition of the range of X and note that
the set of percentile points associated with the $x_i$'s, $U = \{u_0, \dots, u_{k+1}\}$,
is uniquely defined by

$$u_0 = 0, \qquad Q(u_i) = x_i, \quad i = 1, \dots, k, \qquad u_{k+1} = 1. \tag{2.1}$$


The probability mass corresponding to the ith interval can then be
written as

$$F(x_i) - F(x_{i-1}) = u_i - u_{i-1}. \tag{2.2}$$

Suppose that instead of X the object of interest is a monotone
increasing transformation T(X) which, for convenience of presentation,
is discretized to obtain a new variable

$$T_U(X) = m_i, \qquad x_{i-1} < X \le x_i, \tag{2.3}$$

where $m_i$ is the conditional mean of T(X) on the ith interval, i.e.,

$$m_i = (u_i - u_{i-1})^{-1} \int_{x_{i-1}}^{x_i} T(x) f(x)\,dx = (u_i - u_{i-1})^{-1} \int_{u_{i-1}}^{u_i} TQ(u)\,du, \tag{2.4}$$

and $TQ(u) = T(Q(u))$. The characteristics of T(X) are then summarized
by $(x_i, m_i)$, $i = 1, \dots, k+1$. Observing that T(X) and $T_U(X)$ will have
identical expectations, whose common value may be taken without loss
of generality as zero, the "within group variance" of this summarization
scheme can be written as

$$V(T(X) - T_U(X)) = \int_0^1 TQ(u)^2\,du - \sum_{i=1}^{k+1} (u_i - u_{i-1})\, m_i^2. \tag{2.5}$$

Since this variance is a function of the partition or grouping,
the $x_i$'s, or equivalently U, should be chosen to minimize (2.5).
Therefore, let us define the set of all "k-point spacings" by

$$D_k = \{(u_0, u_1, \dots, u_{k+1}) : 0 = u_0 < u_1 < \dots < u_{k+1} = 1\} \tag{2.6}$$

and consider the problem of selecting a $U^* \in D_k$ that satisfies

$$V(T(X) - T_{U^*}(X)) = \inf_{U \in D_k} V(T(X) - T_U(X)). \tag{2.7}$$

A spacing $U^*$ satisfying (2.7) will be termed an optimal spacing.


It should be emphasized that choosing an optimal spacing is equivalent
to choosing an optimal partition. In subsequent work we will, therefore,
often indicate only how to obtain an optimal U, with the use of
(2.1) to obtain the corresponding grouping being an implied next step.
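As a rough numerical illustration of (2.4) and (2.5), the following sketch evaluates the group means and the within group variance for a given spacing. The function names, the midpoint quadrature rule and the example TQ(u) = u - 1/2 (a centered uniform variable) are assumptions made here for illustration and are not taken from the paper.

```python
def _midpoint_integral(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (j + 0.5) * h) for j in range(n))


def group_means(TQ, U):
    """Conditional means m_i of (2.4) for the spacing U = (u_0, ..., u_{k+1})."""
    return [_midpoint_integral(TQ, a, b) / (b - a) for a, b in zip(U[:-1], U[1:])]


def within_group_variance(TQ, U):
    """Within group variance (2.5): integral of TQ^2 minus sum of (u_i - u_{i-1}) m_i^2."""
    intervals = list(zip(U[:-1], U[1:]))
    total = sum(_midpoint_integral(lambda u: TQ(u) ** 2, a, b) for a, b in intervals)
    means = group_means(TQ, U)
    between = sum((b - a) * m * m for (a, b), m in zip(intervals, means))
    return total - between


# Hypothetical example: X uniform on [0, 1], centered so that TQ has mean zero.
TQ = lambda u: u - 0.5
v_one_group = within_group_variance(TQ, [0.0, 1.0])
v_two_groups = within_group_variance(TQ, [0.0, 0.5, 1.0])
```

With a single group the value is simply the variance of the uniform law; splitting at u = 1/2 reduces it, which is the effect that an optimal spacing maximizes.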

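Let $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ denote the usual $L_2[0,1]$ inner product and
norm and note that $(u_i - u_{i-1})\, m_i^2 = \langle TQ, B_i\rangle^2$ where $B_i$ is the ith normalized
B-spline for the knot sequence $u_i$, $i = 1, \dots, k+1$, with

$$B_i(u) = \begin{cases} (u_i - u_{i-1})^{-1/2}, & u_{i-1} < u \le u_i, \\ 0, & \text{otherwise.} \end{cases}$$

Then, as $\langle B_i, B_j\rangle = \delta_{ij}$, we have

$$V(T(X) - T_U(X)) = \int_0^1 TQ(u)^2\,du - \sum_{i=1}^{k+1} \langle TQ, B_i\rangle^2 = \|TQ - P_U(TQ)\|^2 \tag{2.8}$$

where $P_U$ is the projection operator for the linear span of
$\{B_i : i = 1, \dots, k+1\}$. Thus minimizing (2.5) with respect to U is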
and

$$\frac{\partial S_i}{\partial u_{i+1}} = (u_{i+1} - u_i)^{-1}\,[m_{i+1} - TQ(u_{i+1})], \qquad 1 \le i \le k-1. \tag{2.12}$$

When log(TQ)' is concave, it follows from the proof of Theorem 1
that the Jacobian is diagonally dominant and positive definite at the
optimal spacing so that, with a good initial guess, Newton's method
will find the optimal solution. A discussion of uniqueness conditions
such as those in Theorem 1, as well as the algorithm implied by (2.9)-
(2.12) phrased in a regression design setting, can be found in
Eubank, Smith, and Smith (1981, 1982). See also Barrow et al. (1978)
for related work.
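A minimal sketch of such a Newton iteration is given below. It assumes that the optimality equations take the form $S_i(U) = 2TQ(u_i) - m_i - m_{i+1} = 0$ (the form in which they appear in (3.4) with TQ = Q) and that the tridiagonal Jacobian is obtained by differentiating the group means with respect to the knots. The function names and the test function TQ(u) = exp(u), for which log(TQ)' is linear and hence concave, are hypothetical choices made for illustration.

```python
import math


def group_means(U, ITQ):
    """Means of TQ over each interval, from an antiderivative ITQ of TQ."""
    return [(ITQ(b) - ITQ(a)) / (b - a) for a, b in zip(U[:-1], U[1:])]


def residual(U, TQ, ITQ):
    """S_i(U) = 2 TQ(u_i) - m_i - m_{i+1} at the interior knots i = 1, ..., k."""
    m = group_means(U, ITQ)
    return [2.0 * TQ(U[i]) - m[i - 1] - m[i] for i in range(1, len(U) - 1)]


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal linear system."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def newton_spacing(k, TQ, dTQ, ITQ, tol=1e-12, max_iter=50):
    """Newton iteration for S(U) = 0, started from the equally spaced knots."""
    U = [i / (k + 1) for i in range(k + 2)]
    for _ in range(max_iter):
        S = residual(U, TQ, ITQ)
        if max(abs(s) for s in S) < tol:
            break
        m = group_means(U, ITQ)
        lower, diag, upper = [], [], []
        for i in range(1, k + 1):
            wl, wr, t = U[i] - U[i - 1], U[i + 1] - U[i], TQ(U[i])
            # Partial derivatives of S_i with respect to u_i, u_{i-1} and u_{i+1}.
            diag.append(2.0 * dTQ(U[i]) - (t - m[i - 1]) / wl - (m[i] - t) / wr)
            lower.append(-(m[i - 1] - TQ(U[i - 1])) / wl if i > 1 else 0.0)
            upper.append(-(TQ(U[i + 1]) - m[i]) / wr if i < k else 0.0)
        step = solve_tridiagonal(lower, diag, upper, S)
        U = [0.0] + [u - s for u, s in zip(U[1:-1], step)] + [1.0]
    return U


# Hypothetical example: TQ(u) = exp(u); its derivative and antiderivative are both exp.
U_star = newton_spacing(3, math.exp, math.exp, math.exp)
max_residual = max(abs(s) for s in residual(U_star, math.exp, math.exp))
```

Because each equation involves only a knot and its two neighbours, the linear system at every step is tridiagonal and can be solved in O(k) operations. No safeguard against knots leaving (0,1) or crossing is included, so a good starting value is assumed, as in the text.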

Frequently  for  complicated  TQ  functions  it  will  be  convenient 
to  use  the  approximate  (asymptotic)  solution  provided  by  the  next 
theorem  whose  proof  is  an  application  of  Theorem  1.1  of  Burchard 
and  Hale  (1975)  and  Theorem  U. A  of  Pence  and  Smith  (1982). 


Theorem 2. Assume that $TQ \in L_2[0,1] \cap C(0,1)$ and that either
i) $(TQ)' \in C[0,1]$ or ii) $|(TQ)'|$ is integrable over $[\alpha,\beta]$ for any
$0 < \alpha < \beta < 1$ and monotone almost everywhere with $|(TQ)'(u)|^{2/3}$
integrable. Define the density

$$h(u) = |(TQ)'(u)|^{2/3} \Big/ \int_0^1 |(TQ)'(s)|^{2/3}\,ds \tag{2.13}$$

with corresponding q.f., $H^{-1}$, assumed to be in $C^1[0,1]$ and let $\{U_k\}$
denote the spacing sequence whose kth element is
$U_k = \{0, H^{-1}(1/(k+1)), \dots, H^{-1}(k/(k+1)), 1\}$.
Under these assumptions

$$\lim_{k\to\infty} k^2\, V(T(X) - T_{U_k}(X)) = \lim_{k\to\infty} k^2 \inf_{U \in D_k} V(T(X) - T_U(X)) = \frac{1}{12}\Big[\int_0^1 |(TQ)'(u)|^{2/3}\,du\Big]^3. \tag{2.14}$$


Theorem 2 has the interpretation that the within group variances
corresponding to optimal spacings and spacings chosen as the (k+1)-tiles
of h have identical asymptotic (as $k \to \infty$) behaviour, which
suggests that a computationally expedient solution may be obtained
by using the partition $x_i = Q(H^{-1}(i/(k+1)))$ for k sufficiently large.
Alternative conditions on h (rather than $H^{-1}$) that, under assumption
i), also imply Theorem 2 are given in Theorem 3.1 of Sacks and
Ylvisaker (1968).
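The limit in (2.14) can be checked numerically in a simple case. The sketch below uses the illustrative choice TQ(u) = u^2, which is an assumption made here and not an example from the paper. For this function h(u) is proportional to $|2u|^{2/3}$, so that $H(u) = u^{5/3}$ and $H^{-1}(v) = v^{3/5}$, and the right side of (2.14) equals $(2^{2/3} \cdot 3/5)^3/12 = 0.072$. The quantity computed is the squared $L_2$ distance in (2.8), which does not require TQ to be centered.

```python
def variance_u_squared(U):
    """Exact value of (2.5) for the illustrative choice TQ(u) = u**2."""
    between = sum((b ** 3 - a ** 3) ** 2 / (9.0 * (b - a))
                  for a, b in zip(U[:-1], U[1:]))
    return 1.0 / 5.0 - between


def asymptotic_spacing(k):
    """(k+1)-tiles of h: u_i = H^{-1}(i/(k+1)) = (i/(k+1))**(3/5)."""
    return [(i / (k + 1)) ** 0.6 for i in range(k + 2)]


def equal_spacing(k):
    """Equally spaced knots, used only as a point of comparison."""
    return [i / (k + 1) for i in range(k + 2)]


k = 200
limit_constant = (2.0 ** (2.0 / 3.0) * 3.0 / 5.0) ** 3 / 12.0  # right side of (2.14)
scaled_asymptotic = k ** 2 * variance_u_squared(asymptotic_spacing(k))
scaled_equal = k ** 2 * variance_u_squared(equal_spacing(k))
```

For large k the scaled variance of the density-based spacing should be close to the limit constant, while equally spaced knots give a visibly larger value.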

In the remainder of this paper it will be seen that a variety
of statistical problems can be formulated as in this section and, hence,
are all variable knot piecewise constant approximation problems.
Consequently, Theorems 1 and 2 furnish a unified approach that, in many
cases, provides new results for the problem areas we consider.
Connections with the work of others will be discussed in the appropriate
sections. However, we note at the outset that the conditions imposed
here appear to be weaker than those employed by others to obtain
comparable results. In addition, the uniqueness conditions in Theorem
1 are essentially the first of their kind for most of the problems we
examine. This is of particular importance in view of their implications
for the computational algorithm that follows from equations (2.9)-(2.12).

To conclude this section it should be noted that in some cases,
which arise subsequently, T(X) will involve unknown parameters. In
such instances values for these parameters may be available from
previous or pilot studies, prior knowledge or, perhaps, from a null
hypothesis that is to be tested. Of course if the parameters
are of a "location-scale" variety, i.e., $TQ(u) = c + dW(u)$ for
some known function W, an optimal value for U can still be determined
since $\|TQ - P_U(TQ)\| = |d|\,\|W - P_U W\|$. Although the computation of
the $x_i$'s may still require knowledge of c and d, the optimal U's will
still be useful in analyzing the robustness of (2.5) to incorrect
guesses for the parameter values (cf. Kulldorff (1961, Sections 2.7,
8.5 and 9.4)).


3. Optimal stratification and grouping. In this section
several problems of optimal stratification and grouping are considered
that are related in the sense that all can be formulated as
piecewise constant approximation problems for the quantile function.
In each case the results of Section 2 provide techniques for both
exact and approximate solutions. We begin with an optimal
stratification problem.

For a random variable X with continuous p.d.f., f, Dalenius (1950)
considered the problem of dividing the range of X into strata, with
boundaries $a = x_0 < \dots < x_{k+1} = b$, so as to minimize the variance of the
usual estimate of the mean from a stratified random sample of size N,
$\bar X = \sum_{i=1}^{k+1} (F(x_i) - F(x_{i-1}))\,\bar X_i$, where $\bar X_i$ is the sample mean for the ith
stratum. Using the notation of Section 2 the mean and variance of
the ith stratum $(x_{i-1}, x_i]$ can be written as

$$m_i = (u_i - u_{i-1})^{-1} \int_{u_{i-1}}^{u_i} Q(u)\,du \tag{3.1}$$

and

$$\sigma_i^2 = (u_i - u_{i-1})^{-1} \int_{u_{i-1}}^{u_i} (Q(u) - m_i)^2\,du. \tag{3.2}$$

For proportional allocation, where the number of elements taken from
the ith stratum is $N(u_i - u_{i-1})$, it follows from Dalenius (1950) and
the previous section that the variance of $\bar X$ is

$$V(\bar X) = N^{-1} \sum_{i=1}^{k+1} (u_i - u_{i-1})\,\sigma_i^2 = N^{-1}\Big[\int_0^1 Q(u)^2\,du - \sum_{i=1}^{k+1} (u_i - u_{i-1})\, m_i^2\Big] = N^{-1}\,\|Q - P_U Q\|^2. \tag{3.3}$$
Thus, selecting strata to minimize $V(\bar X)$, under proportional allocation,
is equivalent to finding the best set of knots for $L_2[0,1]$ piecewise
constant approximation of the q.f. and we are now in a position to
apply results from Section 2. Consequently, for $Q \in C^1[0,1] \cap L_2[0,1]$
with $Q' > 0$ on [0,1] optimal spacing candidates may be found as
solutions to

$$S_i(U) = 2Q(u_i) - m_i - m_{i+1} = 0, \qquad i = 1, \dots, k, \tag{3.4}$$

using (2.10)-(2.12), with TQ = Q, and Newton's method. Equations
(3.4) were first considered in the context of optimal stratification
by Dalenius (1950). An approximate solution to these equations is
provided by $u_i = H^{-1}(i/(k+1))$, the (k+1)-tiles of the density

$$h(u) = \{Q'(u)\}^{2/3} \Big/ \int_0^1 \{Q'(s)\}^{2/3}\,ds. \tag{3.5}$$

Examples of this approximate solution are $u_i = i/(k+1)$ for the uniform
distribution ($Q(u) = u$), $u_i = 1 - (1 - i/(k+1))^3$ for the exponential distribution
($Q(u) = -\log(1-u)$) and $u_i = (i/(k+1))^{3/2}$ for $F(x) = x^2$ on [0,1] ($Q(u) = u^{1/2}$).
Whereas all three distributions satisfy the hypotheses of Theorem 2,
the latter two do not satisfy the continuity conditions on Q' imposed
by Theorem 1, meaning we are not immediately justified in using (3.4)
to compute optimal strata for random variables with these distributions.
This problem will now be considered in more detail.
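The three closed-form approximate spacings quoted above, together with the strata boundaries $x_i = Q(u_i)$ implied by (2.1), can be tabulated with a few lines of code. The following is only an illustrative sketch; the function and dictionary names are hypothetical.

```python
import math


def approximate_spacings(k, law):
    """Approximate spacings u_i = H^{-1}(i/(k+1)), i = 1, ..., k, for three example laws."""
    v = [i / (k + 1) for i in range(1, k + 1)]
    if law == "uniform":        # Q(u) = u
        return v
    if law == "exponential":    # Q(u) = -log(1 - u)
        return [1.0 - (1.0 - t) ** 3 for t in v]
    if law == "power":          # F(x) = x^2 on [0, 1], Q(u) = u^(1/2)
        return [t ** 1.5 for t in v]
    raise ValueError("unknown law: " + law)


QUANTILE = {
    "uniform": lambda u: u,
    "exponential": lambda u: -math.log(1.0 - u),
    "power": lambda u: math.sqrt(u),
}


def approximate_boundaries(k, law):
    """Strata boundaries x_i = Q(u_i) obtained from the spacings via (2.1)."""
    return [QUANTILE[law](u) for u in approximate_spacings(k, law)]


exp_u = approximate_spacings(3, "exponential")
exp_x = approximate_boundaries(3, "exponential")
pow_u = approximate_spacings(3, "power")
pow_x = approximate_boundaries(3, "power")
```

For the exponential law the boundaries simplify to $x_i = -3\log(1 - i/(k+1))$, and for $F(x) = x^2$ to $x_i = (i/(k+1))^{3/4}$, so both spacings place the boundaries further into the region where Q changes most rapidly.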

From Theorem 1, equations (3.4) will have a unique solution if
Q and $Q' = 1/fQ$ are continuous with $1/fQ > 0$ on [0,1] and $-\log fQ$ concave
on (0,1). The latter two conditions are usually satisfied. However,
for most laws $fQ(0) = fQ(1) = 0$ and Q is finite at 0 and 1 only for
laws having a finite range. As a result, Q and Q' frequently will
not satisfy the continuity conditions at 0 and 1. We now illustrate
how the approach of Section 2 can be modified to deal with such
difficulties for certain types of laws. The arguments follow those
by Barrow et al. (1978) and Chow (1982). The basic approach will
be indicated here with the interested reader referred to either of
these two papers for further details. Although the discussion which
follows will be phrased in terms of approximation of Q the results
will, of course, apply to TQ in general upon appropriate modification.

Now assume that Q is not piecewise constant for any k and is
an element of $C^1(0,1) \cap L_2[0,1]$. It then follows from Chow (1982),
or may be verified directly, that

$$Q(u_i) - (P_U Q)(u_i-) = (u_i - u_{i-1}) \int_0^1 s\,Q'(s(u_i - u_{i-1}) + u_{i-1})\,ds, \qquad i = 2, \dots, k, \tag{3.6}$$

and

$$Q(u_i) - (P_U Q)(u_i+) = -(u_{i+1} - u_i) \int_0^1 (1-s)\,Q'(s(u_{i+1} - u_i) + u_i)\,ds, \qquad i = 1, \dots, k-1. \tag{3.7}$$

It is now assumed that (3.6) and (3.7) are well defined at i = 1 and k
respectively. Note that this allows for cases such as $Q(u) = u^{1/2}$
and $-\log(1-u)$. Using the local nature of piecewise constant
approximation, it is easily shown by differentiating the error functional on
the subintervals that for optimal U

$$|Q(u_i) - (P_U Q)(u_i-)| = |Q(u_i) - (P_U Q)(u_i+)|, \qquad i = 1, \dots, k. \tag{3.8}$$

If Q' > 0 on (0,1) we may use (3.6) and (3.7) to rewrite (3.8) as

$$S_i(U) = (u_i - u_{i-1}) \int_0^1 s\,Q'(s(u_i - u_{i-1}) + u_{i-1})\,ds - (u_{i+1} - u_i) \int_0^1 (1-s)\,Q'(s(u_{i+1} - u_i) + u_i)\,ds = 0, \qquad i = 1, \dots, k, \tag{3.9}$$

which is precisely (3.4). Consequently, the necessary conditions (3.4)
still hold under these weaker assumptions. If, in addition, the Jacobian
matrix of $S(U) = (S_1(U), \dots, S_k(U))$ is positive definite then
arguments in Section 4 of Barrow et al. (1978) may be used to show that
(3.9) has a unique solution. In particular, it follows from results
in Section 3 of Barrow et al. (1978) that if log Q' is concave the
solution to (3.9) is unique provided $\sum_{j=1}^{k} \partial S_1/\partial u_j$ and $\sum_{j=1}^{k} \partial S_k/\partial u_j$
are positive. To illustrate the use of this result let $Q(u) = u^{1/2}$
or $\log u$ and observe that we need only show $\sum_{j=1}^{k} \partial S_1/\partial u_j > 0$ as both
functions are continuously differentiable at 1. From (3.9) we have

$$\sum_{j=1}^{k} \frac{\partial S_1}{\partial u_j} = \int_0^1 s\,Q'(s u_1)\,ds + u_1 \int_0^1 s^2 Q''(s u_1)\,ds - (u_2 - u_1) \int_0^1 (1-s)\,Q''(s(u_2 - u_1) + u_1)\,ds$$

which is found to be positive for both $u^{1/2}$ and $\log u$. Thus, there
exist unique optimal strata boundaries for the distribution $F(x) = x^2$
and, from symmetry considerations, for the exponential distribution
as well.
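For the distribution $F(x) = x^2$ on [0,1] the equations (3.4) can be solved with very little code, since $Q(u_i) = x_i$ and the stratum means are available in closed form. The sketch below does not use Newton's method; instead it substitutes a simple fixed-point (Lloyd type) iteration $x_i \leftarrow (m_i + m_{i+1})/2$, which is an assumption made here for brevity. Function names are hypothetical.

```python
def stratum_means(x):
    """Conditional means of X on (x_{i-1}, x_i] when f(x) = 2x on [0, 1]."""
    return [(2.0 / 3.0) * (b ** 3 - a ** 3) / (b ** 2 - a ** 2)
            for a, b in zip(x[:-1], x[1:])]


def optimal_boundaries(k, tol=1e-13, max_iter=100000):
    """Fixed-point iteration x_i <- (m_i + m_{i+1}) / 2, i.e. (3.4) with Q(u_i) = x_i."""
    x = [i / (k + 1) for i in range(k + 2)]
    for _ in range(max_iter):
        m = stratum_means(x)
        new_x = [0.0] + [(m[i - 1] + m[i]) / 2.0 for i in range(1, k + 1)] + [1.0]
        change = max(abs(p - q) for p, q in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x


x_opt = optimal_boundaries(3)
u_opt = [t ** 2 for t in x_opt]          # spacing u_i = F(x_i) = x_i^2
m_opt = stratum_means(x_opt)
residuals = [2.0 * x_opt[i] - m_opt[i - 1] - m_opt[i] for i in range(1, 4)]
x_single = optimal_boundaries(1)[1]      # one boundary, two strata
```

For a single boundary the equation $2x = m_1 + m_2$ reduces to $x^2 + x - 1 = 0$, so the iteration should return $x = (\sqrt{5} - 1)/2$; this gives a convenient check of the code.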

The approximate solutions obtained from h in (3.5) (and others
that are asymptotically equivalent) have also been studied by Ekman
(1960, 1963, 1969) and Sarndal (1961, 1962). Ekman (1963, pg. 78)
imposes the conditions that f' and f'' exist and are continuous over
any finite interval and that $f_k(z) = z^{k} f(z^{-1})$ exists for some k > 3 for
which $f_k'$ and $f_k''$ also exist and are finite for some z in a neighborhood
of 0 and $0 < f_k < \infty$. Although comparison is somewhat difficult these
conditions seem more restrictive and more difficult to check, for
most laws, than the conditions on Q required in Theorem 2. More
immediate comparisons can be made with Sarndal (1961) who requires
Q to have four bounded continuous derivatives. It is also of interest
to note that Ekman (1963) shows that, for optimal strata boundaries,
$\lim_{k\to\infty} k^2 \inf_{U \in D_k} N V(\bar X) = [\int_0^1 Q'(u)^{2/3}\,du]^3/12$, provided f is continuous and Q
is square integrable. Quantile based conditions for this result to
hold, such as $Q \in L_2[0,1]$ with Q' integrable, can be obtained from
Theorem 1.1 of Burchard and Hale (1975).

Under optimal (or Neyman) allocation the variance of $\bar X$ is
not (3.3) but rather $N^{-1}[\sum_{i=1}^{k+1} (u_i - u_{i-1})\,\sigma_i]^2$. Approximate solutions to
the optimal stratification problem in this case, similar to those
discussed previously, have been studied by Dalenius and Hodges (1957,
1959), Ekman (1959a, b, c, 1960, 1963), Sethi (1963) and others.
They use stratification points that are selected (or are asymptotically
equivalent to those selected) from the density proportional to $f(x)^{1/2}$.
Making the change of variable x = Q(u), this is recognized as equivalent
to selecting spacings according to $h(u) = Q'(u)^{1/2} / \int_0^1 Q'(s)^{1/2}\,ds$ which
is the same density one would use in knot selection for piecewise
constant $L_1[0,1]$ approximation of Q (cf. Pence and Smith (1982)).
This has the interesting consequence of establishing an asymptotic
equivalence between variable knot $L_1[0,1]$ piecewise constant
approximation of Q and optimal strata selection under Neyman allocation.
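The two allocation schemes can be compared numerically for a fixed set of strata. The sketch below computes $N V(\bar X)$ as $\sum_i (u_i - u_{i-1})\sigma_i^2$ under proportional allocation and as $[\sum_i (u_i - u_{i-1})\sigma_i]^2$ under Neyman allocation, the standard expressions when finite population corrections are ignored. The quadrature rule, the function names and the example spacing are assumptions made for illustration.

```python
import math


def _midpoint_integral(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (j + 0.5) * h) for j in range(n))


def stratum_weights_and_sds(Q, U):
    """Weights u_i - u_{i-1} and standard deviations sigma_i from (3.1)-(3.2)."""
    out = []
    for a, b in zip(U[:-1], U[1:]):
        w = b - a
        m = _midpoint_integral(Q, a, b) / w
        var = _midpoint_integral(lambda u: (Q(u) - m) ** 2, a, b) / w
        out.append((w, math.sqrt(var)))
    return out


def n_times_variance(Q, U, allocation):
    """N * V(X-bar) under proportional or Neyman allocation."""
    ws = stratum_weights_and_sds(Q, U)
    if allocation == "proportional":
        return sum(w * s * s for w, s in ws)
    if allocation == "neyman":
        return sum(w * s for w, s in ws) ** 2
    raise ValueError("unknown allocation: " + allocation)


# Hypothetical example: uniform law, Q(u) = u, with deliberately unequal strata.
U = [0.0, 0.2, 1.0]
Q_uniform = lambda u: u
nv_prop = n_times_variance(Q_uniform, U, "proportional")
nv_neyman = n_times_variance(Q_uniform, U, "neyman")
```

By the Cauchy-Schwarz inequality the Neyman value can never exceed the proportional one for the same strata, and the two coincide only when all stratum standard deviations are equal.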

Several other authors have considered problems that are formally
equivalent to the problem of optimal stratification with proportional
allocation. A problem of grouping to "minimize loss of information"
considered by Cox (1957) utilizes a loss function whose expectation
is proportional to (3.3) and a "mixing problem" considered by
Ekman (1969) can also be formulated as minimization of $\|Q - P_U Q\|^2$
in a particular instance. Under certain restrictions (see Chan and
Adatia (1981)) a three group regression problem discussed by Gibson
and Jowett (1975) provides an estimate of a regression coefficient
whose variance is proportional to $[\sum_{i=1}^{3} (u_i - u_{i-1})\, m_i^2]^{-1} = [\|Q\|^2 - \|Q - P_U Q\|^2]^{-1}$.
Consequently, the results presented in this section are directly
applicable to all these problems.

A grouping and combining problem posed by Rade (1963) can also
be formulated as piecewise constant approximation of Q. Given an
additive quality variable X with zero mean and symmetric density f,
and a grouping $-\infty = x_{-(k+1)} < \dots < x_{-1} < x_0 < x_1 < \dots < x_{k+1} = \infty$,
where $x_0 = 0$ and $x_{-i} = -x_i$, an observation on X that falls between
$x_{-i}$ and $x_{-(i-1)}$ is paired with one from the interval $(x_{i-1}, x_i)$.
The objective is to choose a grouping that maximizes the proportional
increase in variability from pairing values at random over that for
the grouped pairing scheme. This proportional increase in variability
is shown to be $\sum_{i=1}^{k+1} (u_i - u_{i-1})\, m_i^2$ from which we see that the problem
is equivalent to optimal knot selection for piecewise constant $L_2[.5,1]$
approximation of Q. The results of this section are now applicable
after the obvious modifications.

4. Optimal grouping and spacing. The problems considered in
this section can all be formulated as piecewise constant approximation
of fQ or the product of fQ and Q, fQ·Q. We begin by considering
a problem of optimal quantile selection for location or scale
parameter estimation.

Let $X_1, \dots, X_N$ denote a random sample from a distribution of the
form $F((x-\mu)/\sigma)$ where $\mu$ and $\sigma$ are respectively location and scale
parameters and F is a known distributional form with associated p.d.f.
f and q.f. Q. Define the sample quantile function by

$$\tilde Q(u) = X_{(j)}, \qquad \frac{j-1}{N} < u \le \frac{j}{N}, \quad j = 1, \dots, N, \tag{4.1}$$

where $X_{(j)}$ is the jth sample order statistic. It is frequently
convenient to estimate $\mu$ or $\sigma$ by linear functions of k < N sample
quantiles. For a given $U \in D_k$ such estimators have the form

$b_0 + \sum_{j=1}^{k} b_j \tilde Q(u_j)$ where explicit formulae for asymptotically (as
$N \to \infty$) optimal weights have been given by Ogawa (1951). The
estimators of $\mu$ and $\sigma$ that result from Ogawa's weights are called the
asymptotically best linear unbiased estimators (ABLUE's) and will
be denoted here by $\mu(U)$ and $\sigma(U)$. When $\sigma$ is known, $\mu(U)$ has
asymptotic relative Fisher efficiency (ARE)

$$\mathrm{ARE}(\mu(U)) = I(\mu)^{-1} \sum_{i=1}^{k+1} \frac{[fQ(u_i) - fQ(u_{i-1})]^2}{u_i - u_{i-1}} \tag{4.2}$$

where $I(\mu) = \int_0^1 [(fQ)'(u)]^2\,du$ and we assume that $fQ(0) = fQ(1) = 0$.
Similarly, when $\mu$ is known and $fQ(0)Q(0) = fQ(1)Q(1) = 0$,

$$\mathrm{ARE}(\sigma(U)) = I(\sigma)^{-1} \sum_{i=1}^{k+1} \frac{[fQ(u_i)Q(u_i) - fQ(u_{i-1})Q(u_{i-1})]^2}{u_i - u_{i-1}} \tag{4.3}$$

where $I(\sigma) = \int_0^1 [(fQ\cdot Q)'(u)]^2\,du$. The ARE's of both estimators are
functions of U and, consequently, U should be chosen to maximize one
of (4.2) or (4.3) thereby obtaining a best k-quantile subset for
estimating the parameter of interest. This problem of optimal
spacing selection has received considerable attention in the
literature (see Cheng (1975) and Eubank (1981) for references).

Maximizing (4.2) (or, equivalently, minimizing $1 - \mathrm{ARE}(\mu(U))$)
is seen to follow the pattern in Section 2 by taking

$$m_i = (u_i - u_{i-1})^{-1} \int_{u_{i-1}}^{u_i} (fQ)'(u)\,du$$

so that TQ = (fQ)'. For scale parameter estimation the analogous
result follows with TQ = (fQ·Q)'. Therefore, the problem of optimal
spacing selection for $\mu(U)$ and $\sigma(U)$ is equivalent to optimal knot
selection for piecewise constant $L_2[0,1]$ approximation of (fQ)' and
(fQ·Q)' respectively.
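As an illustration of (4.2), the sketch below evaluates the ARE of a location estimator for the standard normal distribution, for which $I(\mu) = 1$. It relies on `statistics.NormalDist` from the Python standard library; the function names are hypothetical and the convention fQ(0) = fQ(1) = 0 is imposed explicitly.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fQ(u):
    """Density-quantile function of the standard normal, with fQ(0) = fQ(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(u))


def are_location(U, fisher_information=1.0):
    """ARE in (4.2) for a spacing U = (0, u_1, ..., u_k, 1)."""
    total = sum((fQ(b) - fQ(a)) ** 2 / (b - a) for a, b in zip(U[:-1], U[1:]))
    return total / fisher_information


median_are = are_location([0.0, 0.5, 1.0])
quartile_are = are_location([0.0, 0.25, 0.5, 0.75, 1.0])
```

With the single quantile u = 1/2 the formula gives $4\,fQ(1/2)^2 = 2/\pi$, the familiar asymptotic efficiency of the sample median for normal data, which serves as a check on the implementation.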

Equations (2.9), in this setting, have been utilized to compute
optimal spacings for a variety of distributions (cf. Chan and
Kabir (1969) and Cheng (1975)). A general approach to this
problem, including a computational algorithm using (2.9)-(2.12),
is discussed in Eubank, Smith and Smith (1982). The conditions
for uniqueness of optimal spacings provided by Theorem 1 were
given previously in Eubank (1981) and, for $\mu(U)$, require that
(fQ)' and (fQ)'' be continuous with (fQ)'' of one sign on [0,1]
and log (fQ)'' (or log(-(fQ)'') as appropriate) concave on (0,1).
Results for $\sigma(U)$ follow similarly. We note that these restrictions
can be weakened, as in Section 3, to deal with distributions
such as the Weibull, $F(x) = 1 - \exp\{-x^{\nu}\}$, $x, \nu > 0$, for which
$(fQ\cdot Q)''(u) = -\nu(1-u)^{-1}$ and, hence, does not satisfy the stated
continuity conditions. These uniqueness conditions are to be
compared with those imposed by Rhodin (1976) who requires that
fQ and fQ·Q have three continuous derivatives and also satisfy a
concavity condition. As an approximate solution one may use
spacings selected according to the densities

$$\begin{cases} |(fQ)''(u)|^{2/3} \Big/ \int_0^1 |(fQ)''(s)|^{2/3}\,ds, & \sigma \text{ known}, \\ |(fQ\cdot Q)''(u)|^{2/3} \Big/ \int_0^1 |(fQ\cdot Q)''(s)|^{2/3}\,ds, & \mu \text{ known}, \end{cases} \tag{4.4}$$

examples of which can be found in Eubank (1981). These densities
were also proposed by Sarndal (1961, 1962) under the condition
that fQ and fQ·Q have four continuous derivatives.
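For the standard normal distribution with $\sigma$ known, the first density in (4.4) leads to a closed-form spacing. The following derivation is made here for illustration and is not quoted from the paper: since $(fQ)'(u) = -Q(u)$ for the normal law, $(fQ)''(u) = -1/fQ(u)$ and the density is proportional to $fQ(u)^{-2/3}$. The change of variable $x = Q(u)$ turns $\int fQ(s)^{-2/3}\,ds$ into $\int f(x)^{1/3}\,dx$, and $f^{1/3}$ is proportional to a normal density with variance 3, so that $H^{-1}(v) = \Phi(\sqrt{3}\,\Phi^{-1}(v))$. A short sketch, with hypothetical names, is:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_location_spacing(k):
    """Approximate spacing u_i = Phi(sqrt(3) * Phi^{-1}(i/(k+1))), i = 1, ..., k."""
    root3 = 3.0 ** 0.5
    return [STD_NORMAL.cdf(root3 * STD_NORMAL.inv_cdf(i / (k + 1)))
            for i in range(1, k + 1)]


spacing_1 = normal_location_spacing(1)
spacing_3 = normal_location_spacing(3)
```

The resulting spacings are symmetric about 1/2 and lie further out in the tails than the equally spaced values i/(k+1).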

Now suppose that one has two random samples $Z_1, \dots, Z_n$ and
$Y_1, \dots, Y_m$, with d.f.'s F and G respectively, and wishes to test
the hypothesis G(x) = F(x) against the alternative $G(x) = F(x - \mu)$.
If $X_{(1)}, \dots, X_{(N)}$ (N = n+m) denotes the combined ordered sample and
$\tilde Q$ is defined as in (4.1), this hypothesis may be tested using a rank
test based on a statistic of the form

$$R_N = \int_0^1 J(u)\,\delta(\tilde Q(u))\,du$$

where $J(u) = c_{jN}$, $(j-1)/N < u \le j/N$, and $\delta(\tilde Q(u)) = 1$ if $\tilde Q(u)$ is a Z
observation and is zero otherwise. Gastwirth (1966) shows how J
may be chosen to obtain the asymptotically most powerful rank test
(a.m.p.r.t.) and, given a spacing $U \in D_k$, also considers group rank
tests based on statistics of the form

$$R_N(U) = \sum_{j=1}^{k+1} c_j \int_{u_{j-1}}^{u_j} \delta(\tilde Q(u))\,du.$$

It is then shown that, for optimal $c_j$, the ARE of the resulting
asymptotically most powerful group rank test (a.m.p.g.r.t.) to the
a.m.p.r.t. is precisely (4.2). For testing G(x) = F(x) against
the alternative $G(x) = F(x/\sigma)$ the analogous result is that the
asymptotic efficiency of the a.m.p.g.r.t. relative to the a.m.p.r.t.
is (4.3). Thus previous comments on optimal spacing selection for
$\mu(U)$ and $\sigma(U)$, including conditions for uniqueness, the computational
algorithm in Eubank, Smith and Smith (1982) and the densities (4.4),
apply to the problem of optimal group selection for the a.m.p.g.r.t.
as well.

Given a grouping $a = x_0 < x_1 < \dots < x_{k+1} = b$, Kulldorff (1958a,
b, 1961) considered the problem of maximum likelihood estimation of
a parameter, $\theta$, when the available information from a random sample
of size N consists only of the number of values falling in each
interval $(x_{i-1}, x_i]$, $i = 1, \dots, k+1$. Let $F(x;\theta)$ denote the common
d.f. for the sample elements with associated p.d.f., q.f. and
density-quantile function $f(x;\theta)$, $Q(u;\theta)$ and $fQ(u;\theta) = f(Q(u;\theta);\theta)$.
Then, under regularity conditions, it is shown that the asymptotic (as $N \to \infty$)
variance of the maximum likelihood estimator (MLE) is

$$V(\hat\theta) = \Big\{\sum_{i=1}^{k+1} (u_i - u_{i-1}) \Big(\frac{\partial}{\partial\theta} \log(u_i - u_{i-1})\Big)^2\Big\}^{-1} \tag{4.5}$$

where $u_i = F(x_i;\theta)$. Now

$$\frac{\partial}{\partial\theta}\log(u_i - u_{i-1}) = -(u_i - u_{i-1})^{-1}\Big[fQ(u_i;\theta)\frac{\partial Q(u_i;\theta)}{\partial\theta} - fQ(u_{i-1};\theta)\frac{\partial Q(u_{i-1};\theta)}{\partial\theta}\Big],$$

which follows from the identity

$$\frac{\partial F(x;\theta)}{\partial\theta}\Big|_{x=Q(u;\theta)} = -fQ(u;\theta)\,\frac{\partial Q(u;\theta)}{\partial\theta},$$

so Theorems 1 and 2 are applicable with $TQ(u) = \frac{\partial}{\partial u}[fQ(u;\theta)\,\partial Q(u;\theta)/\partial\theta]$.
When $\theta$ is a location or scale parameter (4.5) is, apart from constant
multiples, identical to (4.2) and (4.3) respectively so that selecting
optimal spacings for the ABLUE's and MLE's of $\mu$ and $\sigma$ are equivalent
problems. However, in the latter case the $x_i$'s must also be determined,
which requires knowledge of $\theta$. Kulldorff (1958a, b, 1961) has
investigated the solutions to equations (2.9) for the normal and exponential
distributions and found that, in these cases, $V(\hat\theta)$ behaves somewhat
robustly with respect to incorrect guesses for $\theta$.
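The passage from the grouped-data likelihood to the quantile domain can be checked numerically for the exponential scale family $F(x;\theta) = 1 - \exp(-x/\theta)$. The sketch below computes the grouped information $\sum_i p_i (\partial \log p_i/\partial\theta)^2$, with $p_i = u_i - u_{i-1}$, in two ways: directly by central differences in $\theta$ with the $x_i$ held fixed, and through the quantile-domain expression, using $fQ(u;\theta)\,\partial Q(u;\theta)/\partial\theta = -(1-u)\log(1-u)/\theta$ for this law. The grouping, the value of $\theta$ and all function names are assumptions made for illustration.

```python
import math


def exp_cdf(x, theta):
    """F(x; theta) = 1 - exp(-x / theta); gives 1 at x = infinity."""
    return 1.0 - math.exp(-x / theta)


def grouped_information_direct(x, theta, eps=1e-6):
    """Sum of (d p_i / d theta)^2 / p_i, with derivatives by central differences."""
    total = 0.0
    for a, b in zip(x[:-1], x[1:]):
        p = exp_cdf(b, theta) - exp_cdf(a, theta)
        p_plus = exp_cdf(b, theta + eps) - exp_cdf(a, theta + eps)
        p_minus = exp_cdf(b, theta - eps) - exp_cdf(a, theta - eps)
        dp = (p_plus - p_minus) / (2.0 * eps)
        total += dp * dp / p
    return total


def g(u, theta):
    """fQ(u; theta) * dQ(u; theta)/d theta for the exponential law; zero at u = 0, 1."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -(1.0 - u) * math.log(1.0 - u) / theta


def grouped_information_quantile(U, theta):
    """The same quantity written as a sum over the spacing U."""
    return sum((g(b, theta) - g(a, theta)) ** 2 / (b - a)
               for a, b in zip(U[:-1], U[1:]))


theta = 2.0
x = [0.0, 1.0, 3.0, math.inf]
U = [exp_cdf(t, theta) for t in x]
info_direct = grouped_information_direct(x, theta)
info_quantile = grouped_information_quantile(U, theta)
```

Both values should agree up to the finite-difference error, and neither can exceed the ungrouped Fisher information $1/\theta^2$, since the grouped information is the squared norm of a projection of TQ.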

An insightful paper by Adatia and Chan (1981) investigates the
question of when the problems of optimal quantile selection for
the ABLUE, optimal stratification with proportional allocation and
optimal grouping for the MLE's of $\mu$ and $\sigma$ are equivalent. It now
follows from the work in Section 2 that these problems are equivalent
if we are approximating, in each case, linear combinations of
the same function. For instance, for location parameter estimation
these three problems are equivalent if

$$(fQ)'(u) = c + dQ(u). \tag{4.6}$$

For scale parameter estimation the analogous condition is

$$(fQ\cdot Q)'(u) = c + dQ(u). \tag{4.7}$$

Conditions (4.6) and (4.7) are the quantile domain version of the
principal condition in Theorem 5 of Adatia and Chan (1981) (they
also provide conditions under which (4.6) and (4.7) are both necessary
and sufficient). If one considers location parameter estimation for
distributions having support on the entire real line, (4.6) gives a
differential equation in f (namely $f' - (c + dx)f = 0$) for which
the normal distribution is the only solution. Similarly, for scale
parameter estimation and distributions having support on $(0,\infty)$ the
only solution to (4.7) is the gamma family of distributions. In
particular, it follows from this that all the problems considered, up
to this point, are equivalent in the special case of a normal or gamma
distribution. This result will also be found to hold in the remaining
section.
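Conditions (4.6) and (4.7) are easy to verify numerically in two standard cases. For the standard normal law $(fQ)'(u) = -Q(u)$, so (4.6) holds with c = 0 and d = -1; for the unit exponential law (a gamma distribution with shape one) $fQ(u)Q(u) = -(1-u)\log(1-u)$ and $(fQ\cdot Q)'(u) = 1 - Q(u)$, so (4.7) holds with c = 1 and d = -1. The sketch below checks both identities by central differences on a small grid; the grid, step size and names are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def derivative(fn, u, eps=1e-6):
    """Central-difference approximation to fn'(u)."""
    return (fn(u + eps) - fn(u - eps)) / (2.0 * eps)


def normal_Q(u):
    return N01.inv_cdf(u)


def normal_fQ(u):
    return N01.pdf(N01.inv_cdf(u))


def exp_Q(u):
    return -math.log(1.0 - u)


def exp_fQ_times_Q(u):
    # fQ(u) = 1 - u for the unit exponential law
    return (1.0 - u) * exp_Q(u)


grid = [0.1, 0.25, 0.5, 0.75, 0.9]
# (4.6) for the normal law with c = 0, d = -1
location_gap = max(abs(derivative(normal_fQ, u) - (0.0 - 1.0 * normal_Q(u))) for u in grid)
# (4.7) for the unit exponential law with c = 1, d = -1
scale_gap = max(abs(derivative(exp_fQ_times_Q, u) - (1.0 - 1.0 * exp_Q(u))) for u in grid)
```

Both gaps should be at the level of the finite-difference error, which is consistent with the statement that the problems coincide for the normal and gamma families.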

Other  problems,  related  to  those  in  this  section,  have  been 
considered  by  Ogawa  (1952),  McClure  (1980a, b),  Koutrouvellis  (1981) 
and  Saleh  (1981) .  There  is  also  a  relationship  between  the  problem 
of  optimal  quantile  selection  for  the  ABLUE's  and  regression  design 
for  time  series  with  Brownian  motion  or  Brownian  bridge  errors  that 
is  explored  in  Eubank  (1981)  and  Eubank,  Smith  and  Smith  (1982). 

5.  Other  applications.  In  this  final  section  several  other 
problems  are  considered  some  of  which  have  a  bivariate  nature.  We 
begin  with  another  stratification  problem. 
5.1 Optimal stratification. In sampling, the variable that is
used for the purpose of stratification usually differs from the response
variable. Let X denote the variable, having p.d.f. and q.f. denoted
f and Q, upon which we intend to stratify. Assuming that X is related
to the response variable Y by

$$Y = \mu_Y(X) + e, \tag{5.1}$$

where e is a zero mean random variable that is independent of X,
the problem we now consider is how to select strata boundaries,
$a = x_0 < x_1 < \dots < x_{k+1} = b$, which minimize the variance of $\bar Y$,
the mean response from a sample of size N selected with proportional allocation.
Let $\mu_Y Q(u) = \mu_Y(Q(u))$ and define

$$m_i = (u_i - u_{i-1})^{-1} \int_{u_{i-1}}^{u_i} \mu_Y Q(u)\,du \tag{5.2}$$

where $u_i$ is given by (2.1). The variance of $\bar Y$ is then readily
verified to be

$$V(\bar Y) = N^{-1}\Big[\sigma_e^2 + \int_0^1 \mu_Y Q(u)^2\,du - \sum_{i=1}^{k+1} (u_i - u_{i-1})\,m_i^2\Big] = N^{-1}\big[\sigma_e^2 + \|\mu_Y Q - P_U(\mu_Y Q)\|^2\big] \tag{5.3}$$

where $\sigma_e^2$ is the variance of e. Consequently, the problem of optimal
strata selection under model (5.1) is equivalent to free knot piecewise
constant approximation of $TQ = \mu_Y Q$. Under the conditions of
Theorem 1 a U satisfying a necessary condition for optimality can be
obtained as a solution to the equations

$$2\mu_Y Q(u_i) - m_i - m_{i+1} = 0, \qquad i = 1, \dots, k, \tag{5.4}$$

which have also been considered by Dalenius and Gurney (1951)
and Herlekar (1967). We now observe that their solution is unique
if $\log(\mu_Y Q)'$ is concave. An approximate solution is provided by the
density proportional to $|(\mu_Y Q)'(u)|^{2/3}$.

In the event that $\mu_Y Q(u) = c + dQ(u)$, i.e., Y has a linear
regression on X, it follows immediately from comments in Section 2
that the problem of strata selection reduces to the problem of
approximating Q treated in Section 3.
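To see how the choice of strata on X affects the variance of the mean response in (5.3), consider the purely hypothetical case in which X is uniform on [0,1] and the regression function is $\mu_Y(x) = x^3$, so that $\mu_Y Q(u) = u^3$. The density proportional to $|(\mu_Y Q)'(u)|^{2/3}$ is then proportional to $u^{4/3}$, giving $H(u) = u^{7/3}$ and $H^{-1}(v) = v^{3/7}$. The values of k, $\sigma_e^2$ and N below are arbitrary, and the function name is an assumption of this sketch.

```python
def variance_of_mean(U, sigma_e2, N):
    """V(Y-bar) from (5.3) for the illustrative choice mu_Y Q(u) = u**3."""
    between = sum((b ** 4 - a ** 4) ** 2 / (16.0 * (b - a))
                  for a, b in zip(U[:-1], U[1:]))
    approximation_error = 1.0 / 7.0 - between   # ||mu_Y Q - P_U(mu_Y Q)||^2
    return (sigma_e2 + approximation_error) / N


k, sigma_e2, N = 4, 0.01, 100
equal_strata = [i / (k + 1) for i in range(k + 2)]
# (k+1)-tiles of the density proportional to |3u^2|^(2/3): u_i = (i/(k+1))**(3/7)
density_strata = [(i / (k + 1)) ** (3.0 / 7.0) for i in range(k + 2)]
v_equal = variance_of_mean(equal_strata, sigma_e2, N)
v_density = variance_of_mean(density_strata, sigma_e2, N)
```

Only the approximation error term depends on the strata; the contribution $\sigma_e^2/N$ of the error variable is a floor that no choice of boundaries can remove.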

A  similar  problem  that  concerns  optimal  grouping  and  combining 
has  been  considered  by  Rade  (1963).  The  problem  is  essentially  the 
same  as  the  one  discussed  in  Section  3  except  that  now  the  grouping 
is  to  be  performed  on  an  auxiliary  variable  X  which  is  correlated  with 
the  quality  variable  Y.  The  selection  of  optimal  groupings,  in  this 
case, is found to be a best L2[.5, 1] approximation problem for the
"conditional mean", $\mu_Y Q$, which parallels the result obtained in Section
3  for  the  one  variable  case.  We  note  in  passing  that  the  grouping 
problem  of  Cox  (1957)  and  the  "mixing  problem"  considered  by  Ekman  (1969) 
have  bivariate  extensions  that  can  also  be  analyzed  using  the  techniques 
presented  here. 

5.2 Optimal chi-squared test for homogeneity. Let X be a random
variable having p.d.f. f and q.f. Q. For a continuous density, g,
Pearson's $\phi^2$ is defined by


where integration is over the range of X and $gQ(u) = g(Q(u))$. We
assume that (5.5) is finite and note that $\phi^2$ provides a measure of the
distance between f and g. If the range of X is now partitioned into
contiguous subintervals having boundaries $a = x_0 < x_1 < \ldots < x_{k+1} = b$
it then follows from Lancaster (1969, pg. 86) or Bofinger (1975) that
the resulting grouped $\phi^2$ can be written as

$\phi_U^2 = \sum_{i=1}^{k+1} (u_i - u_{i-1})\, m_i^2$  (5.6)


Bofinger (1975) also notes that a spacing selected to minimize (5.8)
will, under certain conditions, maximize the non-centrality parameter
of a chi-squared test for the equality of the distributions corresponding
to f and g, thereby providing a best chi-squared homogeneity test. By
taking $TQ = gQ/fQ$, optimal and asymptotically (as $k \to \infty$) optimal
groupings for $\phi_U^2$ can now be obtained using the results in Section 2.

If $g(x) = f(x;\theta)$ and $f(x) = f(x;\theta_0)$ for $\theta$ close to $\theta_0$ then, with
notation  as  in  Section  4,  we  may  use  the  approximation  (see  Lancaster  (1969, 
pg.  89)  or  Bofinger  (1975)) 

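$gQ(u;\theta)/fQ(u;\theta_0) \approx 1 + (\theta - \theta_0)\,\frac{\partial}{\partial\theta}[\log fQ(u;\theta)]_{\theta=\theta_0}$ .  (5.9)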

The minimization of $\phi^2 - \phi_U^2$ then reverts to the problem of approximating
$\frac{\partial}{\partial\theta}[\log fQ(u;\theta)]$ at $\theta = \theta_0$ that was previously considered in Section 4.

In the case of $\theta$ a location or scale parameter and f a normal or gamma
density,  previous  comments  regarding  problem  equivalences  now  also  extend, 
approximately,  to  this  setting. 

5.3 Optimal grouping for bivariate distributions. Let (X,Y)
denote a continuous bivariate random variable with joint p.d.f. i(x,y)
and marginal densities f and g, for X and Y respectively, that are assumed
to satisfy $\int\!\!\int [\,i(x,y)^2 / (f(x)g(y))\,]\,dx\,dy < \infty$. Also, let $\xi(X)$ and $\eta(Y)$ denote
the first canonical variables of the X and Y space (cf. Lancaster (1969,
Chap. VI)) that correspond to the first (i.e., largest) canonical
correlation, $\rho$. If X is grouped as in previous sections, the resulting first
canonical variable for the new grouped X space was shown by Bofinger (1970)
to be

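$\xi_U(X) = m_i \Big/ \left[\sum_{j=1}^{k+1} (u_j - u_{j-1})\, m_j^2\right]^{1/2}$  (5.10)

where

$m_i = (u_i - u_{i-1})^{-1} \int_{u_{i-1}}^{u_i} (\xi Q)(u)\,du$  (5.11)

and Q is the q.f. for X. The correlation between $\xi_U(X)$ and $\eta(Y)$ was
then shown to be

$\rho_U = \rho \left[\sum_{i=1}^{k+1} (u_i - u_{i-1})\, m_i^2\right]^{1/2}$ .  (5.12)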
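As a small numerical illustration of (5.11) and (5.12), the sketch below evaluates the ratio $\rho_U/\rho$ for a given grouping. It assumes the standardized bivariate normal case, in which $\xi(X) = X$ and Q is the standard normal q.f., so that $m_i$ is simply the mean of X within the i-th group. The function names are hypothetical and the three-group boundaries are illustrative values only.

```python
import math


def phi(x):
    """Standard normal density, taken as 0 at +/- infinity."""
    if math.isinf(x):
        return 0.0
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def correlation_ratio(x):
    """rho_U / rho from (5.12), assuming xi Q = Q for a standard normal X."""
    b = [-math.inf] + sorted(x) + [math.inf]
    total = 0.0
    for lo, hi in zip(b, b[1:]):
        p = Phi(hi) - Phi(lo)            # u_i - u_{i-1}
        m = (phi(lo) - phi(hi)) / p      # m_i of (5.11)
        total += p * m * m
    return math.sqrt(total)


ratio_median_split = correlation_ratio([0.0])            # two groups, cut at 0
ratio_three_groups = correlation_ratio([-0.612, 0.612])  # illustrative boundaries
```

Under these assumptions a split at the median gives a ratio of $\sqrt{2/\pi}$, and adding a third group raises the ratio, which is always below one.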

One method of optimally grouping one of the variables in a bivariate
distribution, considered by Bofinger (1970), is to choose a spacing that
maximizes $\rho_U$. In view of (5.12), this problem is now recognized as
equivalent to optimal knot selection for piecewise constant approximation
of $TQ = \xi Q$. Consequently, for $\xi Q$ and $(\xi Q)'$ continuous with $(\xi Q)' > 0$
on [0,1] a $U \in D_k$ satisfying a necessary condition for optimality can be
computed by solving the system of equations

$2\xi Q(u_i) - m_i - m_{i+1} = 0$,  $i = 1, \ldots, k$,

that  was  also  derived  by  Bofinger  (1970).  As  an  approximate  solution 
one  may  instead  use  spacings  selected  according  to 

$h(u) = |(\xi Q)'(u)|^{2/3} \Big/ \int_0^1 |(\xi Q)'(s)|^{2/3}\,ds$ .  (5.13)
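One way to read (5.13) is that the approximate spacing consists of the i/(k+1) quantiles of the density h. The following sketch, under that reading, integrates $|(TQ)'|^{2/3}$ with a midpoint rule and inverts the resulting distribution function by linear interpolation. The function `asymptotic_spacing`, the grid size, and the two test functions ($TQ(u) = u$ and $TQ(u) = u^2$) are assumptions made for illustration and do not come from the text.

```python
def asymptotic_spacing(dTQ, k, grid=20000):
    """Return u_1..u_k with H(u_i) = i/(k+1), H the d.f. of h in (5.13).

    dTQ is the derivative (TQ)'(u) on (0, 1), supplied by the caller.
    """
    step = 1.0 / grid
    # Midpoint rule for the unnormalized density |(TQ)'(u)|^(2/3).
    weights = [abs(dTQ((j + 0.5) * step)) ** (2.0 / 3.0) * step
               for j in range(grid)]
    total = sum(weights)
    knots, cum, j = [], 0.0, 0
    for i in range(1, k + 1):
        target = total * i / (k + 1)
        # Advance to the grid cell in which the cumulative mass hits target.
        while cum + weights[j] < target:
            cum += weights[j]
            j += 1
        # Linear interpolation inside cell j.
        knots.append((j + (target - cum) / weights[j]) * step)
    return knots


uniform = asymptotic_spacing(lambda u: 1.0, 3)      # TQ(u) = u
skewed = asymptotic_spacing(lambda u: 2.0 * u, 3)   # TQ(u) = u**2
```

For $TQ(u) = u$ the density h is uniform and the knots are equally spaced. For $TQ(u) = u^2$ one has $h(u) \propto u^{2/3}$, so $u_i = (i/(k+1))^{3/5}$ and the knots move toward the region where TQ changes fastest.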

For a standardized bivariate normal distribution the canonical variables
are Hermite-Chebycheff polynomials (Eagleson (1964)) with $\xi(X) = X$
so that $\xi Q = Q$. Consequently, for the normal distribution the problems
of optimal stratification, optimal quantile selection (for $\mu(U)$), optimal
grouping for the MLE of $\mu$, optimal grouping for $\rho_U$, etc. are all
equivalent. As a result the optimal groupings and spacings for all these
problems can be found in Kulldorff (1963) for k = 1(1)10. The
asymptotically optimal spacing given by (5.13) is found to be
$u_i = \Phi(\sqrt{3}\,\Phi^{-1}(i/(k+1)))$, where $\Phi$ is the standard normal d.f., with
corresponding grouping $x_i = \sqrt{3}\,\Phi^{-1}(i/(k+1))$, both of which are easily
computed from tables of the standard normal. There are also bivariate gamma
distributions having polynomial canonical variables (Kibble (1941),
Eagleson (1964)) so that similar comments regarding the equivalence
of previous problems obtain for these laws. In this instance optimal
spacings have been computed by Rhodin (1975) for k = 1(1)10 and shape
parameter values v = 2(1)10. Asymptotically optimal spacings obtained
using (5.13) have been given by Särndal (1964) for k = 1(1)10 and
v = 2(1)5.
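The closed form for the normal case follows from (5.13) with $\xi Q = \Phi^{-1}$: the substitution $x = \Phi^{-1}(s)$ turns the integral of $|(\xi Q)'|^{2/3}$ into an integral of the normal density raised to the power 1/3, which is proportional to a normal density with variance 3. A minimal sketch of the resulting computation is given below, with Python's `statistics.NormalDist` standing in for tables of the standard normal; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def normal_asymptotic_spacing(k):
    """Asymptotically optimal spacing and grouping for the standard normal.

    Returns (u, x) with x_i = sqrt(3) * Phi^{-1}(i/(k+1)) and u_i = Phi(x_i).
    """
    std = NormalDist()
    x = [math.sqrt(3.0) * std.inv_cdf(i / (k + 1)) for i in range(1, k + 1)]
    u = [std.cdf(t) for t in x]
    return u, x


u, x = normal_asymptotic_spacing(2)
```

For k = 2 the grouping is symmetric about zero, with boundaries near ±0.746. These are asymptotic approximations and need not coincide with the exact optimal values for small k.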

In  the  case  when  both  margins  (i.e.  both  X  and  Y)  are  grouped, 
Bofinger  (1970,  1975)  proposed  an  approximate  solution  that,  in  our 
formulation,  is  equivalent  to  finding  best  free  knot  approximants  to 
$\xi Q$ and $\eta Q_Y$ separately where $Q_Y$ is the q.f. for Y. This problem is,
therefore,  also  amenable  to  analysis  by  the  techniques  presented  in 
this  section. 